CLEAN COPY OF AMENDED PORTIONS OF THE SPECIFICATION
Page 2, line 30 to page 3, line 3

MurG is the last enzyme involved in the intracellular phase of peptidoglycan synthesis (Bugg 
& Walsh, 1993). It catalyzes the transfer of N-acetyl glucosamine (NAG) from UDP to the 
C4 hydroxyl of a lipid-linked N-acetylmuramoyl pentapeptide (NAM) to form a β-linked
NAG-NAM disaccharide that is transported across the cell membrane where it is polymerized 
and cross-linked (Fig. 1). In bacterial cells MurG associates with the cytoplasmic surface of 
the membrane (Bupp & van Heijenoort, 1993). However, we have found that E. coli MurG 
can be solubilized at high concentrations in active form (Ha et al., 1999). 

Page 8, line 23 to page 9, line 4 

FIG. 4. Structural analysis of the substrate binding pockets in MurG. A. Structural
comparison between the C-terminal domain of phage T4 β-glucosyltransferase (left) and the
C-terminal domain of E. coli MurG (right). The aligned six β-strands are magenta, the
aligned α-helices are orange, and the other structural elements are blue. In
β-glucosyltransferase, key residues involved in UDP binding are highlighted in yellow. The
analogous residues in MurG are also highlighted in yellow. B. A close-up view of the
proposed donor binding pocket in the MurG C domain with the docked UDP-GlcNAc.
Conserved residues in MurG are colored magenta. The carbonyl oxygen of residue I245 is
shown in red, and its backbone nitrogen is shown in blue. C. The surface of E. coli MurG.
The G loops and other conserved residues in MurG are colored magenta. The proposed
membrane binding interface is also highlighted with hydrophobic residues in yellow and
positively charged residues in blue.
Page 186, line 8 to page 187, line 6

This example describes the crystallization of the E. coli MurG protein and the 
determination of the coordinates of the three-dimensional crystal structure. This example 
also describes the identification of the donor nucleotide binding site, the acceptor binding 
site and the membrane association site of the MurG protein. 

Methods 

Crystallization 

E. coli MurG containing a C-terminal LEHHHHHH sequence was purified as described (Ha
et al., 1999) and concentrated to 10 mg ml⁻¹ in 20 mM Tris-HCl, pH 7.9/150 mM NaCl/50
mM EDTA. The protein concentrate was mixed with UDP-GlcNAc in a 1:3 molar ratio.
Crystals were grown at room temperature using the hanging-drop vapor-diffusion method
by mixing equal volumes of protein with reservoir solution (0.1 M NaMES, pH 6.5/0.96 M
(NH4)2SO4/0.4% Triton X-100/10 mM DTT). Triclinic crystals with a typical size of 0.2
mm × 0.1 mm × 0.1 mm grew within a week. The crystals belong to the P1 space group,
with two molecules per asymmetric unit. The cell dimensions are a = 60.613 Å, b = 6.356
Å, c = 67.902 Å, α = 64.294°, β = 83.520°, γ = 65.448°.
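
The volume of a triclinic (P1) cell follows directly from its six cell constants. The sketch below is illustrative only and is not part of the described method; the function name `triclinic_volume` is an assumption introduced here. It uses the standard formula V = abc·sqrt(1 − cos²α − cos²β − cos²γ + 2·cosα·cosβ·cosγ).

```python
import math


def triclinic_volume(a, b, c, alpha_deg, beta_deg, gamma_deg):
    """Return the volume of a general (triclinic) unit cell.

    a, b, c are edge lengths (for example in Å); the angles are in degrees.
    The result is in cubic units of the edge lengths (for example Å^3).
    """
    cos_a = math.cos(math.radians(alpha_deg))
    cos_b = math.cos(math.radians(beta_deg))
    cos_g = math.cos(math.radians(gamma_deg))
    # Square root of the determinant of the metric tensor, divided by abc.
    factor = math.sqrt(
        1.0 - cos_a**2 - cos_b**2 - cos_g**2 + 2.0 * cos_a * cos_b * cos_g
    )
    return a * b * c * factor


if __name__ == "__main__":
    # Sanity check on a cubic cell, where the volume is simply a^3.
    print(triclinic_volume(10.0, 10.0, 10.0, 90.0, 90.0, 90.0))
```

The same function can be called with the six cell constants listed above. Dividing the resulting volume by the number of molecules in the cell gives the volume available per molecule.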
Page 188 lines 9-36



Table 1. Summary of crystallographic and refinement data

Data set                            Native           HgCl2 (form A    HgCl2 (form B    (NH4)2WS4        (NH4)2OsBr6
                                                     derivative)      derivative)
Resolution (Å)                      1.9              2.0              1.9              2.4              2.3
Observations                        288,150          101,913          245,320          44,366           106,606
Unique reflections                  65,567           53,391           65,581           27,950           36,443
Rsym¹ (last shell)                  0.032 (0.187)    0.043 (0.200)    0.042 (0.296)    0.031 (0.080)    0.056 (0.302)
I/σ (last shell)                    41.9 (7.0)       20.4 (2.9)       29.0 (3.7)       24.6 (8.2)       19.6 (2.5)
Completeness (last shell)           97.7% (96.4%)    91.4% (66.6%)    97.4% (94.0%)    83.8% (62.0%)    94.3% (78.6%)

MIR analysis (40.0 - 2.5 Å)
Mean isomorphous difference²                         0.163            0.130            0.068            0.134
Phasing power³ (last shell)                          1.09 (0.73)      0.57 (0.50)      0.61 (0.24)      0.61 (0.58)
Rcullis⁴ (last shell)                                0.81 (0.91)      0.94 (0.96)      0.92 (0.99)      0.94 (0.95)
Anomalous Rcullis⁴ (last shell)                      0.96 (1.00)      0.95 (1.00)

Refinement statistics
Resolution                          40.0 - 1.9 Å
Reflections (|F| > 2σ)              61,989
Protein atoms (a.u.)                5,280
Water atoms                         298
Sulfate groups                      1
R-factor⁵                           22.0%
R-free⁶                             24.7%
R.m.s.d.⁷
  Bonds (Å)                         0.006
  Angles (°)                        1.29
Ramachandran plot⁸
  Residues in most favored region           94.6%
  Residues in additional allowed region     5.4%

¹ Rsym = Σ|Ij − <I>| / ΣIj, where Ij is the intensity of a reflection, and <I> is the average intensity of that
reflection.

² Mean isomorphous difference = Σ|FPH − FP| / ΣFPH, where FPH and FP are the derivative and native structure
factors respectively.

³ Phasing power is the ratio of the mean calculated derivative structure factor to the mean lack of closure
error.

⁴ Rcullis is the mean residual lack of closure error divided by the dispersive or anomalous difference.

⁵ R-factor = Σ| |Fobs| − |Fcalc| | / Σ|Fobs|

⁶ R-free is the R-factor calculated using 10% of the reflection data chosen randomly and omitted from the
start of refinement.

⁷ R.m.s.d., root-mean-square deviations from ideal bond lengths and bond angles.

⁸ Calculated with program PROCHECK.
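
The definitions of Rsym and the R-factor in the footnotes to Table 1 can be written out as short functions. This is a minimal sketch under stated assumptions, not the software used to produce Table 1: the function names, the dictionary layout for repeated measurements, and the toy numbers in the usage lines are all hypothetical. A small helper for data redundancy (observations divided by unique reflections) is included because both counts appear in the table.

```python
def r_sym(measurements):
    """Rsym = sum|Ij - <I>| / sum(Ij).

    `measurements` maps a reflection identifier to the list of repeated
    intensity measurements Ij recorded for that reflection.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in measurements.values():
        mean_intensity = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_intensity) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


def r_factor(f_obs, f_calc):
    """R-factor = sum| |Fobs| - |Fcalc| | / sum|Fobs| over paired reflections."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def redundancy(observations, unique_reflections):
    """Average number of measurements per unique reflection."""
    return observations / unique_reflections


if __name__ == "__main__":
    # Toy values, chosen only to exercise the functions.
    print(r_sym({"refl_1": [90.0, 110.0], "refl_2": [50.0, 50.0]}))
    print(r_factor([10.0, 20.0], [9.0, 22.0]))
    # Counts taken from the Native column of Table 1.
    print(redundancy(288150, 65567))
```

Applied to the Native column of Table 1, the redundancy helper gives roughly 4.4 measurements per unique reflection.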
Page 189, line 1 to page 191, line 26

The structure consists of two domains separated by a deep cleft (Fig. 2a). Both domains
exhibit an α/β open-sheet structure and have high structural homology despite minimal
sequence homology (RMSD = 2.02 Å over 85 aligned Cα atoms). The N-domain includes
residues 7-163 and 341-357, and contains seven parallel β-strands and six α-helices, the last
of which originates in the C-domain (Fig. 2b). The C-domain comprises residues 164-340
and contains six parallel β-strands and eight α-helices, including one irregular bipartite
helix (α-link) that connects the N-domain to the first β-strand of the C-domain. The
β-strands in both domains are ordered as for a typical Rossmann fold. The N- and C-domains
are joined by a short linker between the seventh β-strand of the N-domain and the α-link of
the C-domain. This inter-domain linker and the peptide segment that joins the last helix of
the C-domain to the last helix of the N-domain define the floor of the cleft between the two
domains. The cleft itself is about 20 Å deep and 18 Å across at its widest point. Contacts < 4
Å across the cleft are limited primarily to interactions between residues from C-α5 and the
loop connecting N-β5 to N-α5.
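
The RMSD quoted above for aligned Cα atoms is a root-mean-square distance over paired atoms. The sketch below illustrates only that final step and assumes the two coordinate sets have already been paired and superimposed by some other means; the function name `rmsd` and the toy coordinates are hypothetical, and the program actually used for the comparison is not specified here.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z).

    Assumes the atoms are already paired one-to-one and that the two
    structures have already been superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Two toy "C-alpha traces" offset by 2 Å along z.
    trace_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
    trace_b = [(0.0, 0.0, 2.0), (3.8, 0.0, 2.0)]
    print(rmsd(trace_a, trace_b))
```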

The α/β open-sheet motif (Rossmann fold) adopted by both the N- and C-domains of MurG
is characteristic of domains that bind nucleotides (Branden & Tooze, 1998). Classical
Rossmann domains typically contain at least one conserved glycine rich motif, with the
consensus sequence GXGXXG, located at a turn between the carboxyl end of one β-strand
and the amino terminus of the adjacent α-helix (Baker et al., 1992). This motif is involved
in binding the negatively charged phosphates (Carugo & Argos, 1997). There are three
glycine rich loops (G loops) in E. coli MurG (Fig. 3a) that may be variants on the phosphate
binding loops found in other dinucleotide binding proteins (see below).
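
A consensus such as GXGXXG can be located in a one-letter protein sequence with a regular expression. The following is a sketch only: the function name `find_gxgxxg` and the toy sequence are hypothetical, and because the MurG G loops are described above as possible variants of this motif, a strict GXGXXG pattern would not necessarily match them.

```python
import re

# Zero-width lookahead so that overlapping occurrences are all reported.
# "." stands for X, i.e. any residue.
GXGXXG = re.compile(r"(?=(G.G..G))")


def find_gxgxxg(sequence):
    """Return (1-based start position, matched hexapeptide) for each GXGXXG hit."""
    return [
        (match.start() + 1, match.group(1))
        for match in GXGXXG.finditer(sequence.upper())
    ]


if __name__ == "__main__":
    # Toy sequence, not taken from any MurG homolog.
    print(find_gxgxxg("MKAGSGKTGLL"))
```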

Sequence homology 

Amino acid sequences for eighteen MurG homologs are now available. The sequence
similarity between E. coli MurG and homologs from other bacterial strains ranges from less
than 30% to more than 90% depending on the evolutionary relationship between the
organisms. In all MurG homologs, however, there are several invariant residues. Fig. 3a
shows a sequence alignment for a subset of MurG homologs with the invariant and highly
conserved residues indicated. These residues, which include the three G loops, have been
highlighted in the E. coli MurG structure (Fig. 3b). Almost all of the invariant residues are
located at or near the cleft between the two domains. Two of the G loops are found in the N
domain (between N-β1/N-α1 and N-β4/N-α4) and one is found in the C-domain (between
C-β1/C-α1). The strict conservation of the highlighted residues among different bacterial
strains, and their location as determined from the E. coli MurG structure, implicate them in
substrate binding and catalytic activity.
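
Invariant positions of the kind described above can be read off a multiple sequence alignment by checking each column for a single shared residue. The sketch below assumes the alignment is supplied as equal-length strings with "-" marking gaps; the function name `invariant_columns` and the toy alignment are hypothetical and do not reproduce the alignment of Fig. 3a.

```python
def invariant_columns(aligned_sequences):
    """Return (0-based column index, residue) for every invariant column.

    A column counts as invariant when every sequence carries the same
    residue there and none of them has a gap ("-").
    """
    if not aligned_sequences:
        return []
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("all aligned sequences must have the same length")
    invariant = []
    for index in range(length):
        column = {seq[index] for seq in aligned_sequences}
        if len(column) == 1 and "-" not in column:
            invariant.append((index, aligned_sequences[0][index]))
    return invariant


if __name__ == "__main__":
    # Toy alignment of three short fragments.
    print(invariant_columns(["GAG-S", "GTGES", "GCGDS"]))
```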

Structural homology reveals the donor binding site 

The three-dimensional backbone structure of E. coli MurG was compared to known protein
structures, including the three other NDP-glycosyltransferase structures that have been
reported (Charnock & Davies, 1999; Gastinel et al., 1999; Vrielink et al., 1994). The
C-terminal domain was found to have significant structural homology (RMSD = 2.218 Å for
89 aligned Cα atoms) to the C-terminal domain of phage T4 β-glucosyltransferase (BGT),
an enzyme that catalyzes the glucosylation of hydroxymethyl-cytosines in duplex DNA. A
co-crystal structure of BGT with UDP bound in the C-terminal domain reveals the topology
of the UDP binding pocket and also shows important contacts to the nucleotide (Morera et
al., 1999; Vrielink et al., 1994). These contacts include: a) hydrogen bonds from the
backbone amide of I238 to the N3 and O4 positions of the base; b) hydrogen bonds between
the carboxyl side chain of E272 and the O2' and O3' hydroxyls of the ribose ring; and c)
contacts from a GGS motif in the loop following the first β-strand of the C domain to the
alpha phosphate of UDP. The structurally homologous C-domain of MurG contains a
topologically similar pocket (Fig. 4a). Furthermore, even though the two domains share only
11% sequence identity overall, there are identical residues in the same spatial location in E.
coli MurG and in BGT. Based on this comparison, we have concluded that the C-domain of
E. coli MurG is the UDP-GlcNAc binding site.
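
A sequence identity figure such as the one quoted above is computed from a pairwise alignment. The sketch below shows one common convention, assumed here for illustration: identical pairs divided by the number of aligned positions in which neither sequence has a gap. Other denominators are also in use, and the convention behind the quoted value is not stated. The function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned positions where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [
        (x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"
    ]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    # Toy pairwise alignment with one gapped column.
    print(percent_identity("GA-TS", "GCATS"))
```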

We have docked UDP-GlcNAc into the C-domain of E. coli MurG using the information on
how UDP binds to BGT as a guide. As shown in Figure 4b, the uracil is held in place by
contacts from the N3 and O4 atoms to the backbone amide of I245. The O2' and O3'
hydroxyls on the ribose sugar are within hydrogen bonding distance of the invariant
glutamate residue (E269) in the middle of helix C-α4. The conserved GGS motif in G loop
3 is positioned to contact the alpha phosphate. When these contacts are made, the
UDP-GlcNAc substrate fits nicely into a pocket in the C-domain, where it is surrounded by
many of the invariant residues identified through sequence analysis of other MurG
homologs. It is possible to propose roles for some of these invariant residues from the
model. For example, the side chain of R261 can be rotated to contact the second phosphate;
this contact may help explain why UDP binds significantly better to MurG than UMP. We
propose that R261 plays an important role in catalysis by stabilizing the UDP leaving group
via electrostatic interactions. The side chain of Q289 is within hydrogen bonding distance
of the C4 hydroxyl of the GlcNAc sugar. This contact may explain why MurG can
discriminate between UDP-GlcNAc and its C4 axial isomer UDP-GalNAc (Ha et al., 1999).
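
Statements that two atoms are "within hydrogen bonding distance" can be checked in a docked model with a simple distance test. The sketch below is illustrative only: the 3.5 Å cutoff is an assumed, commonly used threshold and is not a value given in this specification, the function name `within_hbond_distance` is hypothetical, and the coordinates are toy values rather than atoms from the MurG model. Geometric criteria such as donor-acceptor angles are ignored.

```python
import math


def within_hbond_distance(atom_a, atom_b, cutoff=3.5):
    """Return True if two atoms, given as (x, y, z) in Å, lie within `cutoff` Å.

    The default cutoff of 3.5 Å is an assumption made for this sketch.
    """
    return math.dist(atom_a, atom_b) <= cutoff


if __name__ == "__main__":
    # Toy coordinates for a side-chain oxygen and a sugar hydroxyl oxygen.
    side_chain_oxygen = (0.0, 0.0, 0.0)
    hydroxyl_oxygen = (2.8, 0.0, 0.0)
    print(within_hbond_distance(side_chain_oxygen, hydroxyl_oxygen))
```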

The acceptor binding site 

Structural considerations suggest that the primary acceptor binding site is located in the
N-terminal domain of MurG. This domain contains three highly conserved regions, two of
which are glycine-rich loops that face the cleft (Fig. 3a and 4c). These G loops are
reminiscent of the phosphate binding loops found in other nucleotide binding proteins, and
are most likely involved in binding to the diphosphate on Lipid I. The N-termini of the
helices following each G loop form opposite walls of a small pocket between the G loops.
The helix dipoles create a positively charged electrostatic field in the pocket that can stabilize
the negatively charged diphosphates. When the diphosphate of the acceptor is anchored in the
pocket created by the G-loops, the MurNAc sugar emerges into the cleft between domains
and the C4 hydroxyl can be directed towards the anomeric carbon of the GlcNAc for attack
on the face opposite the UDP leaving group. The third conserved region in the N domain
spans the loop from the end of N-β5 to the middle of N-α5. Kinetic analysis of mutants is
required to evaluate the roles of these residues (Ha et al., 1999; Men et al., 1998).
Page 192, line 21 to page 193, line 5

In addition to this structural homology, we have identified a strikingly similar sequence motif 
in the MurG family and certain other UDP-glycosyltransferase families. This sequence motif 
spans about a thirty amino acid stretch in the C-domain of MurG and includes most of the 
invariant residues found in that domain. As shown in Figure 3a, a similar motif is found in 
the UDP-glucuronosyltransferases (Mackenzie, 1990). Certain residues are identical, 
including a number of prolines and glycines, and the spacing between them is invariant. This 
suggests that the UDP-glucuronosyltransferases contain a region of α/β supersecondary
structure whose function is similar to that of the corresponding region in MurG (Fig. 3c).
This region binds the donor sugar. By analyzing the similarities and differences between the 
conserved residues in this subdomain in the MurG family and other UDP-glycosyltransferase 
families, it may be possible to identify - and perhaps alter - residues that are involved in 
determining donor selectivity. We note that it would be useful to be able to manipulate donor 
specificity because it would extend the utility of glycosyltransferases as reagents for 
glycosylation of complex molecules. Altered glycosyltransferases could also be useful for 
remodeling cell surfaces and for probing the biological roles of particular carbohydrate 
structures. 